{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "BioBot_FDS_YB_02_KNN_Model_181114.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "cells": [
    {
      "metadata": {
        "id": "IjyaQqpewlW4",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# BioBot_FDS_02: KNN_Model\n",
        "## Deliverable_02: Implementing a K-Nearest Neightbors (KNN) Classifier model\n",
        "Author/code developer: Yan Bello. 14/11/2018. As part of the Master in Artificial Intelligence (UNIR). \n",
        "This file/code is part of the development and exploration/experimentation on a Fall Detection System (FDS). \n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "In the following sections, we used this dataset: \n",
        "SisFall: A Fall and Movement Dataset. \n",
        "Created by: A. Sucerquia, J.D. López, J.F. Vargas-Bonilla\n",
        "SISTEMIC, Faculty of Engineering, Universidad de Antiquia UDEA.\n",
        "Detailed information about this dataset can be found in this website: http://sistemic.udea.edu.co/en/investigacion/proyectos/english-falls/.\n",
        "Reference paper: Sucerquia A, López JD, Vargas-Bonilla JF. SisFall: A Fall and Movement Dataset. Sensors (Basel). 2017;17(1):198. Published 2017 Jan 20. doi:10.3390/s17010198\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "3zwtaJ0RwiwC",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 54
        },
        "outputId": "b3f8827c-2054-4ae2-f776-68add8dbed65"
      },
      "cell_type": "code",
      "source": [
        "# Preliminary step 0. We need to establish/select our working folders. First, ensure  the previous dataset files are available.\n",
        "# The code below is prepared to work with two options: local drive or mounting a Google Drive for Colab\n",
        "# Select the appropriate configuration for your environment by commenting/un-commenting the following lines:\n",
        "\n",
        "# To work with Google Colab and Google Drive: \n",
        "from google.colab import drive \n",
        "drive.mount('/content/gdrive')\n",
        "FILE_DIRECTORY =  \"gdrive/My Drive/Colab Notebooks/\"\n",
        "SisFall_ALL_DIRECTORY =  FILE_DIRECTORY + \"SisFall_dataset_ALL/\"\n",
        "\n",
        "# To work with a local drive, uncomment these line:\n",
        "# FILE_DIRECTORY =  os.getcwd() + \"\\\\\"\n",
        "# SisFall_ALL_DIRECTORY =  FILE_DIRECTORY + \"SisFall_dataset_ALL\\\\\""
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount(\"/content/gdrive\", force_remount=True).\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "_uBcnFPnxamQ",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## 2.1 Load a dataframe with prepared info from ADL/Falls dataset"
      ]
    },
    {
      "metadata": {
        "id": "swM4SErNeXmu",
        "colab_type": "code",
        "outputId": "ff9ad7cd-8488-43e0-9908-04473c6b1da5",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1018
        }
      },
      "cell_type": "code",
      "source": [
        "# We work with the prepared file Unified_ADL_Falls, which is based on the previous dataset\n",
        "my_data_file_name = FILE_DIRECTORY + \"Unified_ADL_Falls.txt\"\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "# Creamos un data frame y cargamos los datos del fichero\n",
        "df_ADL_Falls = pd.DataFrame(pd.read_csv(my_data_file_name, sep = ','))\n",
        "\n",
        "df_ADL_Falls.drop('0', axis=1, inplace=True)\n",
        "\n",
        "df_only_ADLs = df_ADL_Falls[df_ADL_Falls.Fall_ADL == \"D\"]\n",
        "df_only_Falls = df_ADL_Falls[df_ADL_Falls.Fall_ADL == \"F\"]\n",
        "\n",
        "# mostramos el data frame\n",
        "print(df_only_ADLs.tail())\n",
        "print(df_only_Falls.tail())"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "     Act_Type Age_Cat Fall_ADL              File  kurtosis_S1_X  max_S1_X  \\\n",
            "2697      D19      SE        D  D19_SE06_R01.txt       8.727956       190   \n",
            "2698      D19      SE        D  D19_SE06_R02.txt      10.096698        86   \n",
            "2699      D19      SE        D  D19_SE06_R03.txt       9.540330       259   \n",
            "2700      D19      SE        D  D19_SE06_R04.txt      20.191198       393   \n",
            "2701      D19      SE        D  D19_SE06_R05.txt       9.022231       230   \n",
            "\n",
            "      mean_S1_X  min_S1_X  range_S1_X  skewness_S1_X    ...     \\\n",
            "2697  20.204659      -195         385      -1.745292    ...      \n",
            "2698 -33.031614      -324         410      -1.976282    ...      \n",
            "2699   8.276206      -154         413       0.398760    ...      \n",
            "2700   9.514143      -255         648       0.993127    ...      \n",
            "2701   9.554077      -164         394      -0.149056    ...      \n",
            "\n",
            "      range_S1_N_VER  skewness_S1_N_VER  std_S1_N_VER  var_S1_N_VER   corr_HV  \\\n",
            "2697        1.531165           2.009740      0.189131      0.035770  0.711721   \n",
            "2698        1.721676           2.839037      0.226830      0.051452  0.865803   \n",
            "2699        1.651294           2.986164      0.175513      0.030805  0.747053   \n",
            "2700        2.525731           4.938333      0.230644      0.053197  0.787008   \n",
            "2701        2.110528           4.118197      0.203275      0.041321  0.796733   \n",
            "\n",
            "       corr_NH   corr_NV   corr_XY   corr_XZ   corr_YZ  \n",
            "2697  0.996825  0.739432 -0.309286 -0.145757  0.702846  \n",
            "2698  0.995884  0.886033  0.752237  0.536177  0.744701  \n",
            "2699  0.998071  0.765374 -0.409197 -0.213251  0.695071  \n",
            "2700  0.998731  0.800299 -0.157456 -0.203438  0.688019  \n",
            "2701  0.998789  0.806128 -0.476529 -0.166258  0.717775  \n",
            "\n",
            "[5 rows x 58 columns]\n",
            "     Act_Type Age_Cat Fall_ADL              File  kurtosis_S1_X  max_S1_X  \\\n",
            "4495      F15      SE        F  F15_SE06_R01.txt      12.015061       175   \n",
            "4496      F15      SE        F  F15_SE06_R02.txt      14.164169       128   \n",
            "4497      F15      SE        F  F15_SE06_R03.txt      10.210935        13   \n",
            "4498      F15      SE        F  F15_SE06_R04.txt       9.538224       137   \n",
            "4499      F15      SE        F  F15_SE06_R05.txt      11.740748       154   \n",
            "\n",
            "       mean_S1_X  min_S1_X  range_S1_X  skewness_S1_X    ...     \\\n",
            "4495 -150.477537      -977        1152      -2.804444    ...      \n",
            "4496 -155.198003      -911        1039      -3.036554    ...      \n",
            "4497 -160.153078      -753         766      -2.682646    ...      \n",
            "4498 -171.712146      -834         971      -2.509883    ...      \n",
            "4499 -147.943428      -944        1098      -2.826632    ...      \n",
            "\n",
            "      range_S1_N_VER  skewness_S1_N_VER  std_S1_N_VER  var_S1_N_VER   corr_HV  \\\n",
            "4495        3.650464           3.294931      0.514555      0.264767 -0.065147   \n",
            "4496        3.290032           4.064083      0.373755      0.139693 -0.055799   \n",
            "4497        2.751549           3.714880      0.397491      0.157999 -0.028757   \n",
            "4498        3.240474           3.509039      0.452012      0.204315 -0.054726   \n",
            "4499        3.904351           3.534182      0.499249      0.249249  0.155856   \n",
            "\n",
            "       corr_NH   corr_NV   corr_XY   corr_XZ   corr_YZ  \n",
            "4495  0.328154  0.881449 -0.082799 -0.530395  0.460717  \n",
            "4496  0.210786  0.946213 -0.056746 -0.469133  0.677548  \n",
            "4497  0.265980  0.930266 -0.240292 -0.541105  0.675271  \n",
            "4498  0.275312  0.925184 -0.341458 -0.624081  0.762479  \n",
            "4499  0.521532  0.902481 -0.197174 -0.673646  0.731832  \n",
            "\n",
            "[5 rows x 58 columns]\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "GBiW3tap1263",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Shuffle and set up training and test samples for ADL/Falls"
      ]
    },
    {
      "metadata": {
        "id": "kk3EZrikeXm4",
        "colab_type": "code",
        "outputId": "7b17c011-2cbb-4e28-e821-94c4d58c0f8f",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import random\n",
        "import math\n",
        "from numpy.random import permutation\n",
        "\n",
        "# Randomly shuffle the index of each set (ADLs and Falls)\n",
        "# -------------------------------------------------------\n",
        "# First we prepare the sets of ADLs\n",
        "random_indices = permutation(df_only_ADLs.index)\n",
        "# Use a test-split (of 30% of the items)\n",
        "test_split = math.floor(len(df_only_ADLs)*0.3)\n",
        "# Test set with 30% of items\n",
        "df_only_ADLs_test = df_only_ADLs.loc[random_indices[0:test_split]]\n",
        "# Train set with 70% of the items.\n",
        "df_only_ADLs_train = df_only_ADLs.loc[random_indices[test_split:]]\n",
        "\n",
        "\n",
        "# -------------------------------------------------------\n",
        "# Now we prepare the sets of Falls\n",
        "random_indices = permutation(df_only_Falls.index)\n",
        "# Use a test-split (of 30% of the items)\n",
        "test_split = math.floor(len(df_only_Falls)*0.3)\n",
        "# Test set with 30% of items\n",
        "df_only_Falls_test = df_only_Falls.loc[random_indices[0:test_split]]\n",
        "# Train set with 70% of the items.\n",
        "df_only_Falls_train = df_only_Falls.loc[random_indices[test_split:]]\n",
        "\n",
        "\n",
        "\n",
        "print(\"Total ADL: \" + str(len(df_only_ADLs)))\n",
        "print(\"Total Falls: \" + str(len(df_only_Falls)))\n",
        "print(\"GRAND Total: \" + str(len(df_only_Falls)+len(df_only_ADLs)))\n",
        "print(\"---------------------------------------\")\n",
        "print(\"Train Falls: \"+ str(len(df_only_Falls_train)))\n",
        "print(\"Train ADL: \"+ str(len(df_only_ADLs_train)))\n",
        "print(\"Train TOTAL: \"+ str(len(df_only_ADLs_train)+len(df_only_Falls_train)))\n",
        "print(\"---------------------------------------\")\n",
        "print(\"Test Falls: \"+ str(len(df_only_Falls_test)))\n",
        "print(\"Test ADL: \"+ str(len(df_only_ADLs_test)))\n",
        "print(\"Test TOTAL: \"+ str(len(df_only_ADLs_test)+len(df_only_Falls_test)))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Total ADL: 2702\n",
            "Total Falls: 1798\n",
            "GRAND Total: 4500\n",
            "---------------------------------------\n",
            "Train Falls: 1259\n",
            "Train ADL: 1892\n",
            "Train TOTAL: 3151\n",
            "---------------------------------------\n",
            "Test Falls: 539\n",
            "Test ADL: 810\n",
            "Test TOTAL: 1349\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "XHEHeHINeXm_",
        "colab_type": "code",
        "outputId": "dff7afec-a51f-48dc-b470-d642b6729ace",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# Prepare dataset with Test examplars\n",
        "\n",
        "frames = [df_only_Falls_test, df_only_ADLs_test]\n",
        "df_ADL_Falls_test = pd.concat(frames)\n",
        "print(\"Test ADLs: \"+ str(len(df_only_ADLs_test)))\n",
        "print(\"Test Falls: \"+ str(len(df_only_Falls_test)))\n",
        "print(\"Test ALL: \"+ str(len(df_ADL_Falls_test)))\n",
        "\n",
        "print(df_ADL_Falls_test.head())\n",
        "print(df_ADL_Falls_test.tail())\n"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Test ADLs: 810\n",
            "Test Falls: 539\n",
            "Test ALL: 1349\n",
            "     Act_Type Age_Cat Fall_ADL              File  kurtosis_S1_X  max_S1_X  \\\n",
            "3477      F07      SA        F  F07_SA12_R02.txt      28.158868      1730   \n",
            "4496      F15      SE        F  F15_SE06_R02.txt      14.164169       128   \n",
            "3031      F03      SA        F  F03_SA19_R01.txt      88.093106       191   \n",
            "4384      F15      SA        F  F15_SA01_R05.txt      64.593266      2619   \n",
            "3719      F09      SA        F  F09_SA12_R04.txt       2.035594        19   \n",
            "\n",
            "       mean_S1_X  min_S1_X  range_S1_X  skewness_S1_X    ...     \\\n",
            "3477   63.324459      -510        2240       3.887497    ...      \n",
            "4496 -155.198003      -911        1039      -3.036554    ...      \n",
            "3031 -146.509151     -4096        4287      -8.141281    ...      \n",
            "4384 -159.004992     -4053        6672      -5.225071    ...      \n",
            "3719 -147.504160      -835         854      -1.118020    ...      \n",
            "\n",
            "      range_S1_N_VER  skewness_S1_N_VER  std_S1_N_VER  var_S1_N_VER   corr_HV  \\\n",
            "3477        6.699628           4.023581      0.718885      0.516795  0.846591   \n",
            "4496        3.290032           4.064083      0.373755      0.139693 -0.055799   \n",
            "3031       16.388624           9.088421      1.221342      1.491676  0.690857   \n",
            "4384       21.553022           8.303136      1.562807      2.442366  0.820121   \n",
            "3719        3.256540           1.456533      0.498986      0.248987  0.088200   \n",
            "\n",
            "       corr_NH   corr_NV   corr_XY   corr_XZ   corr_YZ  \n",
            "3477  0.950524  0.919384 -0.668190 -0.089681  0.438163  \n",
            "4496  0.210786  0.946213 -0.056746 -0.469133  0.677548  \n",
            "3031  0.866353  0.953690  0.424994 -0.329616 -0.443368  \n",
            "4384  0.892859  0.978143  0.305849 -0.657534 -0.152340  \n",
            "3719  0.695683  0.748694 -0.216643 -0.377517  0.326644  \n",
            "\n",
            "[5 rows x 58 columns]\n",
            "     Act_Type Age_Cat Fall_ADL              File  kurtosis_S1_X  max_S1_X  \\\n",
            "757       D08      SA        D  D08_SA22_R05.txt       8.159998       100   \n",
            "652       D08      SA        D  D08_SA01_R05.txt       1.755796        45   \n",
            "1326      D11      SE        D  D11_SE01_R04.txt       3.757317        63   \n",
            "2627      D19      SA        D  D19_SA10_R01.txt       7.738473       170   \n",
            "2576      D18      SA        D  D18_SA23_R05.txt       4.126170       427   \n",
            "\n",
            "      mean_S1_X  min_S1_X  range_S1_X  skewness_S1_X    ...     \\\n",
            "757   15.016639       -38         138       1.151278    ...      \n",
            "652    4.602329       -39          84      -0.146848    ...      \n",
            "1326  -3.467554       -23          86       1.358530    ...      \n",
            "2627   7.294509      -142         312      -0.059787    ...      \n",
            "2576   3.487521      -472         899       0.165259    ...      \n",
            "\n",
            "      range_S1_N_VER  skewness_S1_N_VER  std_S1_N_VER  var_S1_N_VER   corr_HV  \\\n",
            "757         1.391013           1.726104      0.229784      0.052801  0.442195   \n",
            "652         1.234381           2.736062      0.138825      0.019272  0.294266   \n",
            "1326        1.787610           1.850547      0.251917      0.063462  0.411575   \n",
            "2627        1.310244           2.305588      0.182099      0.033160  0.653727   \n",
            "2576        2.997666           3.221152      0.381603      0.145621  0.606482   \n",
            "\n",
            "       corr_NH   corr_NV   corr_XY   corr_XZ   corr_YZ  \n",
            "757   0.998951  0.450273  0.012475  0.136234  0.205409  \n",
            "652   0.999974  0.296621  0.139385  0.288089  0.025525  \n",
            "1326  0.999976  0.412518  0.376528 -0.715585 -0.269799  \n",
            "2627  0.999824  0.658250 -0.083154 -0.110563  0.321832  \n",
            "2576  0.992261  0.669341  0.127315 -0.201458 -0.042817  \n",
            "\n",
            "[5 rows x 58 columns]\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "GjXFNdmfeXnE",
        "colab_type": "code",
        "outputId": "cef0ee0a-afb9-4883-f733-3e674a885081",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# Prepare dataset with Train examplars\n",
        "\n",
        "frames = [df_only_Falls_train, df_only_ADLs_train]\n",
        "df_ADL_Falls_train = pd.concat(frames)\n",
        "print(\"train ADLs: \"+ str(len(df_only_ADLs_train)))\n",
        "print(\"train Falls: \"+ str(len(df_only_Falls_train)))\n",
        "print(\"train ALL: \"+ str(len(df_ADL_Falls_train)))\n",
        "\n",
        "print(df_ADL_Falls_train.head())\n",
        "print(df_ADL_Falls_train.tail())\n"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "train ADLs: 1892\n",
            "train Falls: 1259\n",
            "train ALL: 3151\n",
            "     Act_Type Age_Cat Fall_ADL              File  kurtosis_S1_X  max_S1_X  \\\n",
            "4149      F13      SA        F  F13_SA02_R05.txt       4.165379        53   \n",
            "3575      F08      SA        F  F08_SA07_R05.txt       4.422467       205   \n",
            "4172      F13      SA        F  F13_SA07_R03.txt       2.065487       221   \n",
            "3303      F06      SA        F  F06_SA01_R03.txt      12.473522       893   \n",
            "3400      F06      SA        F  F06_SA20_R05.txt      10.054013      1396   \n",
            "\n",
            "       mean_S1_X  min_S1_X  range_S1_X  skewness_S1_X    ...     \\\n",
            "4149 -147.003328      -918         971      -1.356286    ...      \n",
            "3575 -143.532446      -989        1194      -1.547840    ...      \n",
            "4172  -81.241265      -463         684      -0.420039    ...      \n",
            "3303  129.995008       -88         981       2.952118    ...      \n",
            "3400  123.337770      -565        1961       2.090652    ...      \n",
            "\n",
            "      range_S1_N_VER  skewness_S1_N_VER  std_S1_N_VER  var_S1_N_VER   corr_HV  \\\n",
            "4149        3.616831           1.759811      0.505234      0.255262 -0.053625   \n",
            "3575        4.529287           3.027514      0.653364      0.426884  0.325869   \n",
            "4172        3.010339           2.310889      0.410026      0.168122  0.889051   \n",
            "3303        3.467718           2.645097      0.484077      0.234330  0.475571   \n",
            "3400        5.562395           2.550464      0.684755      0.468889  0.503541   \n",
            "\n",
            "       corr_NH   corr_NV   corr_XY   corr_XZ   corr_YZ  \n",
            "4149  0.340929  0.825944 -0.536538 -0.287334  0.329780  \n",
            "3575  0.693146  0.887493 -0.712236  0.105727  0.335286  \n",
            "4172  0.979584  0.917976 -0.355061  0.163495 -0.478949  \n",
            "3303  0.894396  0.784597 -0.051477  0.412850  0.346640  \n",
            "3400  0.894028  0.806098 -0.058769  0.037300  0.524654  \n",
            "\n",
            "[5 rows x 58 columns]\n",
            "     Act_Type Age_Cat Fall_ADL              File  kurtosis_S1_X  max_S1_X  \\\n",
            "560       D07      SA        D  D07_SA21_R03.txt       9.899222        62   \n",
            "2468      D18      SA        D  D18_SA02_R02.txt       3.125303       276   \n",
            "1550      D12      SE        D  D12_SE08_R03.txt      -0.017980       312   \n",
            "855       D09      SA        D  D09_SA04_R03.txt       3.529780        56   \n",
            "1835      D14      SE        D  D14_SE03_R03.txt      -1.453733       278   \n",
            "\n",
            "       mean_S1_X  min_S1_X  range_S1_X  skewness_S1_X    ...     \\\n",
            "560     2.242928       -25          87       1.670363    ...      \n",
            "2468   15.718802      -193         469       0.742017    ...      \n",
            "1550  211.229617        49         263      -1.128621    ...      \n",
            "855    16.936772       -38          94       0.352948    ...      \n",
            "1835  113.946755       -48         326       0.460365    ...      \n",
            "\n",
            "      range_S1_N_VER  skewness_S1_N_VER  std_S1_N_VER  var_S1_N_VER   corr_HV  \\\n",
            "560         0.573939          -0.044662      0.203851      0.041555  0.255089   \n",
            "2468        2.169816           2.911109      0.275573      0.075941  0.579090   \n",
            "1550        0.425406          -0.336909      0.079033      0.006246 -0.890491   \n",
            "855         0.342374           0.518233      0.084930      0.007213  0.535318   \n",
            "1835        1.793753           0.070799      0.207965      0.043250  0.079973   \n",
            "\n",
            "       corr_NH   corr_NV   corr_XY   corr_XZ   corr_YZ  \n",
            "560   0.999342  0.264373  0.236076  0.002666 -0.427624  \n",
            "2468  0.997391  0.607893 -0.258679 -0.011191  0.089100  \n",
            "1550 -0.827260  0.949986  0.928715 -0.942368 -0.929078  \n",
            "855   0.999046  0.523658 -0.339200  0.047038  0.444544  \n",
            "1835  0.139818  0.950952 -0.220681 -0.766857 -0.049420  \n",
            "\n",
            "[5 rows x 58 columns]\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "colab_type": "text",
        "id": "gvdG8GzIyQN_"
      },
      "cell_type": "markdown",
      "source": [
        "## 2.2 Define and train a K-Neighbors Classifiers\n",
        "Below we use KNeighborsClassifier from sklearn.neighbors, experimenting with various parameter settings. For clarity and simplicity here only two model configuration are included."
      ]
    },
    {
      "metadata": {
        "id": "rOWAmqmiy1a_",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### 2.2-A) K-Neighbors Classifier with default parameters\n",
        "#### The KNN model"
      ]
    },
    {
      "metadata": {
        "id": "rCrswqKleXnI",
        "colab_type": "code",
        "outputId": "3b087742-6f1a-422b-9e75-13d881b040f6",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# The columns that we will be making predictions with.\n",
        "x_columns = ['kurtosis_S1_X','max_S1_X','mean_S1_X','min_S1_X','range_S1_X','skewness_S1_X','std_S1_X','var_S1_X',\n",
        "             'kurtosis_S1_Y','max_S1_Y','mean_S1_Y','min_S1_Y','range_S1_Y','skewness_S1_Y','std_S1_Y','var_S1_Y',\n",
        "             'kurtosis_S1_Z','max_S1_Z','mean_S1_Z','min_S1_Z','range_S1_Z','skewness_S1_Z','std_S1_Z','var_S1_Z',\n",
        "             'kurtosis_S1_N_XYZ','max_S1_N_XYZ','mean_S1_N_XYZ','min_S1_N_XYZ','range_S1_N_XYZ','skewness_S1_N_XYZ','std_S1_N_XYZ','var_S1_N_XYZ',\n",
        "             'kurtosis_S1_N_HOR','max_S1_N_HOR','mean_S1_N_HOR','min_S1_N_HOR','range_S1_N_HOR','skewness_S1_N_HOR','std_S1_N_HOR','var_S1_N_HOR',\n",
        "             'kurtosis_S1_N_VER','max_S1_N_VER','mean_S1_N_VER','min_S1_N_VER','range_S1_N_VER','skewness_S1_N_VER','std_S1_N_VER','var_S1_N_VER',\n",
        "             'corr_HV','corr_NH','corr_NV','corr_XY','corr_XZ','corr_YZ']\n",
        "# The column that we want to predict.\n",
        "y_column = [\"Fall_ADL\"]\n",
        "\n",
        "from sklearn.neighbors import KNeighborsClassifier\n",
        "# Create the knn model.\n",
        "# Look at the five closest neighbors.\n",
        "knn = KNeighborsClassifier(n_neighbors=5)\n",
        "# Fit the model on the training data.\n",
        "y = df_ADL_Falls_train.loc[:,['Fall_ADL']]\n",
        "train_y = np.array(y)\n",
        "\n",
        "knn.fit(df_ADL_Falls_train[x_columns], train_y.ravel())\n",
        "# Make point predictions on the test set using the fit model.\n",
        "predictions = knn.predict(df_ADL_Falls_test[x_columns])\n",
        "\n",
        "print(predictions)\n",
        "print(knn.score(df_ADL_Falls_test[x_columns], df_ADL_Falls_test[y_column]))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "['F' 'D' 'F' ... 'D' 'D' 'D']\n",
            "0.9636767976278725\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "Bw0vAnagzBR4",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "#### Confusion matrix for K-Neighbors Classifier with default parameters"
      ]
    },
    {
      "metadata": {
        "id": "vI-KwMlueXnO",
        "colab_type": "code",
        "outputId": "ec39e6a6-dc09-40e5-d3bb-7462b828fc3a",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import confusion_matrix\n",
        "cm = confusion_matrix(df_ADL_Falls_test[y_column], predictions, labels=[\"D\", \"F\"])\n",
        "print(\"Confusion Matrix:\")\n",
        "print(\"-----------------\")\n",
        "print(cm)\n",
        "print(\"-----------------\")\n",
        "cm_norm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n",
        "print(\"Confusion Matrix (Normalized):\")\n",
        "print(\"-----------------------------\")\n",
        "print(cm_norm)\n",
        "print(\"-----------------------------\")"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Confusion Matrix:\n",
            "-----------------\n",
            "[[794  16]\n",
            " [ 33 506]]\n",
            "-----------------\n",
            "Confusion Matrix (Normalized):\n",
            "-----------------------------\n",
            "[[0.98024691 0.01975309]\n",
            " [0.06122449 0.93877551]]\n",
            "-----------------------------\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "id": "va95CuG1zSdu",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "#### Sensitivity, Specificity, Precision and Accuracy"
      ]
    },
    {
      "metadata": {
        "id": "mFOaQbS_eXnV",
        "colab_type": "code",
        "outputId": "b06c9911-db04-42fc-c94b-b3bb01b2f87a",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# calculations of measurements of performance\n",
        "\n",
        "knn_TP = cm[1,1]\n",
        "knn_FP = cm[1,0]\n",
        "knn_TN = cm[0,0]\n",
        "knn_FN = cm[0,1]\n",
        "\n",
        "# SENSITIVITY = TP / (TP + FN)\n",
        "knn_Sensitivity = knn_TP / (knn_TP + knn_FN)\n",
        "print(\"knn_Sensitivity = \"+ str(knn_Sensitivity))\n",
        "\n",
        "# SPECIFICITY = TN / (FP + TN)\n",
        "knn_Specificity = knn_TN / (knn_FP + knn_TN)\n",
        "print(\"knn_Specificity = \"+ str(knn_Specificity))\n",
        "\n",
        "# Precision = TP / (TP + FP)\n",
        "knn_Precision = knn_TP / (knn_TP + knn_FP)\n",
        "print(\"knn_Precision = \"+ str(knn_Precision))\n",
        "\n",
        "# Accuracy = (TP + TN) / (TP + FP + TN + FN)\n",
        "knn_Accuracy = (knn_TP + knn_TN) / (knn_TP + knn_FP + knn_TN + knn_FN)\n",
        "print(\"knn_Accuracy = \"+ str(knn_Accuracy))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "knn_Sensitivity = 0.9693486590038314\n",
            "knn_Specificity = 0.9600967351874244\n",
            "knn_Precision = 0.9387755102040817\n",
            "knn_Accuracy = 0.9636767976278725\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "colab_type": "text",
        "id": "5lxUj1xX0GSX"
      },
      "cell_type": "markdown",
      "source": [
        "### 2.2-B) K-Neighbors Classifier with K = 10\n",
        "#### The KNN model"
      ]
    },
    {
      "metadata": {
        "id": "3Ls5GezPeXna",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "\"\"\"\n",
        "After experimenting and trying out various parameter configurations, K=10 gave the best performance\n",
        "KNN MODEL WITH K = 10\n",
        "\n",
        "\"\"\"\n",
        "\n",
        "# The columns that we will be making predictions with.\n",
        "x_columns = ['kurtosis_S1_X','max_S1_X','mean_S1_X','min_S1_X','range_S1_X','skewness_S1_X','std_S1_X','var_S1_X',\n",
        "             'kurtosis_S1_Y','max_S1_Y','mean_S1_Y','min_S1_Y','range_S1_Y','skewness_S1_Y','std_S1_Y','var_S1_Y',\n",
        "             'kurtosis_S1_Z','max_S1_Z','mean_S1_Z','min_S1_Z','range_S1_Z','skewness_S1_Z','std_S1_Z','var_S1_Z',\n",
        "             'kurtosis_S1_N_XYZ','max_S1_N_XYZ','mean_S1_N_XYZ','min_S1_N_XYZ','range_S1_N_XYZ','skewness_S1_N_XYZ','std_S1_N_XYZ','var_S1_N_XYZ',\n",
        "             'kurtosis_S1_N_HOR','max_S1_N_HOR','mean_S1_N_HOR','min_S1_N_HOR','range_S1_N_HOR','skewness_S1_N_HOR','std_S1_N_HOR','var_S1_N_HOR',\n",
        "             'kurtosis_S1_N_VER','max_S1_N_VER','mean_S1_N_VER','min_S1_N_VER','range_S1_N_VER','skewness_S1_N_VER','std_S1_N_VER','var_S1_N_VER',\n",
        "             'corr_HV','corr_NH','corr_NV','corr_XY','corr_XZ','corr_YZ']\n",
        "# The column that we want to predict.\n",
        "y_column = [\"Fall_ADL\"]\n",
        "\n",
        "from sklearn.neighbors import KNeighborsClassifier\n",
        "# Create the knn model.\n",
        "# Look at the five closest neighbors.\n",
        "knn = KNeighborsClassifier(n_neighbors=15)\n",
        "# Fit the model on the training data.\n",
        "y = df_ADL_Falls_train.loc[:,['Fall_ADL']]\n",
        "train_y = np.array(y)\n",
        "\n",
        "knn.fit(df_ADL_Falls_train[x_columns], train_y.ravel())\n",
        "# Make point predictions on the test set using the fit model.\n",
        "predictions = knn.predict(df_ADL_Falls_test[x_columns])\n",
        "\n",
        "print(predictions)\n",
        "print(knn.score(df_ADL_Falls_test[x_columns], df_ADL_Falls_test[y_column]))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "colab_type": "text",
        "id": "U42pNhTE0dNv"
      },
      "cell_type": "markdown",
      "source": [
        "#### Confusion matrix for K-Neighbors Classifier with K=10"
      ]
    },
    {
      "metadata": {
        "id": "IXgg65RNeXnf",
        "colab_type": "code",
        "outputId": "061d1bcb-d256-4b11-daa2-c3a88ac5387f",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import confusion_matrix\n",
        "cm = confusion_matrix(df_ADL_Falls_test[y_column], predictions, labels=[\"D\", \"F\"])\n",
        "print(\"Confusion Matrix:\")\n",
        "print(\"-----------------\")\n",
        "print(cm)\n",
        "print(\"-----------------\")\n",
        "cm_norm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n",
        "print(\"Confusion Matrix (Normalized):\")\n",
        "print(\"-----------------------------\")\n",
        "print(cm_norm)\n",
        "print(\"-----------------------------\")"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Confusion Matrix:\n",
            "-----------------\n",
            "[[794  16]\n",
            " [ 33 506]]\n",
            "-----------------\n",
            "Confusion Matrix (Normalized):\n",
            "-----------------------------\n",
            "[[0.98024691 0.01975309]\n",
            " [0.06122449 0.93877551]]\n",
            "-----------------------------\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "colab_type": "text",
        "id": "1fXDaEPR0mun"
      },
      "cell_type": "markdown",
      "source": [
        "#### Sensitivity, Specificity, Precision and Accuracy"
      ]
    },
    {
      "metadata": {
        "id": "SZJyd-0oeXnl",
        "colab_type": "code",
        "outputId": "855b0cb0-4b1b-4d3d-d532-138d102a4118",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# calculations of measurements of performance\n",
        "\n",
        "knn_TP = cm[1,1]\n",
        "knn_FP = cm[1,0]\n",
        "knn_TN = cm[0,0]\n",
        "knn_FN = cm[0,1]\n",
        "\n",
        "# SENSITIVITY = TP / (TP + FN)\n",
        "knn_Sensitivity = knn_TP / (knn_TP + knn_FN)\n",
        "print(\"knn_Sensitivity = \"+ str(knn_Sensitivity))\n",
        "\n",
        "# SPECIFICITY = TN / (FP + TN)\n",
        "knn_Specificity = knn_TN / (knn_FP + knn_TN)\n",
        "print(\"knn_Specificity = \"+ str(knn_Specificity))\n",
        "\n",
        "# Precision = TP / (TP + FP)\n",
        "knn_Precision = knn_TP / (knn_TP + knn_FP)\n",
        "print(\"knn_Precision = \"+ str(knn_Precision))\n",
        "\n",
        "# Accuracy = (TP + TN) / (TP + FP + TN + FN)\n",
        "knn_Accuracy = (knn_TP + knn_TN) / (knn_TP + knn_FP + knn_TN + knn_FN)\n",
        "print(\"knn_Accuracy = \"+ str(knn_Accuracy))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "knn_Sensitivity = 0.9693486590038314\n",
            "knn_Specificity = 0.9600967351874244\n",
            "knn_Precision = 0.9387755102040817\n",
            "knn_Accuracy = 0.9636767976278725\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}